iPAS Exam Preparation Notes - Operational Intelligence Analyst (Entry Level)

After completing two iPAS certifications, I noticed there is another exam for the Operational Intelligence Analyst certification on 2026/10/31. I decided to give it a try, although after reading through the material, I found it to be quite different from what I had imagined. As per my usual practice, after speed-reading the two official handouts, I had AI organize them into notes that I find easier to study. However, this subject lacks past exam papers, making it impossible to estimate potential scores. Furthermore, the handouts only cover basic content, and many exam questions fall within the syllabus but are not explicitly in the handouts. Honestly, this makes me feel a bit uneasy about taking the exam.

Basic Knowledge of Operational Intelligence

Operational Intelligence vs. Business Intelligence

Business Intelligence (BI) is the use of information technology by enterprises to organize actionable information from daily operational data, helping companies improve operational performance and competitiveness.

Operational Intelligence (OI) applies business intelligence to the dynamic systems within an enterprise's operational model. It goes beyond historical reporting, emphasizing fast, real-time, and integrated information flows, organizing key information and critical indicators from processes such as R&D, procurement, production, marketing, finance, accounting, and human resources into a basis for decision-making.

Concept	Focus	Typical Use
Business Intelligence (BI)	Collecting, integrating, analyzing, and presenting enterprise data.	Reports, dashboards, trend analysis, KPI tracking
Operational Intelligence	Integrating BI analysis into enterprise operational processes to support real-time and multi-dimensional decision-making.	Production planning, sales analysis, financial analysis, HR performance
Big Data Analytics	Processing structured, semi-structured, and unstructured data; data-driven.	Social media analysis, customer behavior prediction, smart manufacturing

Operational Intelligence System Architecture

Operational Intelligence systems typically consist of three layers: the Data Layer, the Integration Layer, and the Analytics Layer. Data is first generated by various operational systems, then enters a data warehouse or data platform for integration, and is finally used for decision-making through analytics engines and data presentation tools.

Layer	Description	Common Technologies
Data Layer	Data sources and operational systems.	ERP, CRM, SCM, POS, sensors, external data
Integration Layer	Cleaning, transforming, and integrating data from different sources.	Data warehouse, ETL, Metadata Repository
Analytics Layer	Analyzing data and creating reports.	OLAP, data mining, statistical analysis, BI tools

KPIs and the Balanced Scorecard

A Key Performance Indicator (KPI) is a metric used to measure whether an organization is moving toward its goals. One of the tasks of operational intelligence is to identify KPIs for departments, processes, and the overall organization, allowing managers to review, measure, and correct action plans.

The Balanced Scorecard (BSC) is a strategic management method that translates vision and strategy into specific indicators across four perspectives.

Perspective	Focus Question	Example KPI
Financial	Does it contribute to shareholders and financial results?	Revenue growth rate, gross margin, ROI
Customer	How do customers view the enterprise?	Customer satisfaction, renewal rate, complaint rate
Internal Process	Which processes must perform well?	Delivery fulfillment rate, yield rate, process cycle time
Learning and Growth	Does the organization have the ability to improve continuously?	Employee training hours, skill coverage, number of innovation proposals

There is a causal relationship between the four perspectives of the Balanced Scorecard. Learning and growth serve as the foundation; once internal processes improve, they drive customer satisfaction, which is ultimately reflected in financial results. This causal structure is called a Strategy Map.

Applications of Operational Intelligence in Business Management

Operational Intelligence can be applied to enterprise functions such as production, sales, finance and accounting, and human resources. The key is to transform data from various processes into information that supports management actions.

Enterprise Function	Operational Intelligence Use	Corresponding System
Production Planning and Control	Tracking production quantities, scheduling, capacity, and actual vs. planned variances.	ERP, MRP, MES
Sales and Distribution Analysis	Analyzing customer, order, channel, promotion, and delivery data.	ERP, CRM, SCM, PLM
Finance and Accounting	Integrating operational processes and financial data to support management and financial accounting.	ERP Finance/Accounting Module
Human Resources	Quantifying HR performance and tracking human resource utilization status.	HRIS, BI dashboards

Taking sales and distribution as an example, this process often involves multiple systems simultaneously, illustrating how they divide labor.

System	Role in Sales and Distribution
ERP	Stores sales activities, orders, delivery, shipping, invoicing, and master data.
CRM	Manages customer attributes, marketing activities, promotions, and channel interaction data.
SCM	Supports sales forecasting, distribution networks, inventory, transportation, and order allocation.
PLM	Manages product status, features, and lifecycle information, supporting marketing and pricing.

For the complete definition of each system mentioned in this section, see Common Information Systems for Digital Enterprises later in this document.

Evaluating and Planning Operational Intelligence

Before introducing operational intelligence, one must evaluate whether it can solve current operational problems or support future business directions. Only after evaluation does the project move into planning, confirming whether the enterprise has sufficient technical and non-technical infrastructure.

Stage	Focus	Description
Operational Assessment	Identify operational problems to be solved.	E.g., reducing inventory, improving delivery times, increasing customer satisfaction.
Cost-Benefit Assessment	Compare benefits with implementation costs.	Benefits can include increased revenue, reduced costs, and increased market share.
Risk Assessment	Evaluate project risks.	Technical risks, project complexity, organizational risks, team risks.
Infrastructure Assessment	Check implementation conditions.	Hardware, middleware, databases, processes, data, application systems, metadata.
Project Planning	Establish plans and schedules.	WBS, task dependencies, resources, critical path, duration.

Enterprise value can be measured in four directions.

Question	Measurement Direction	Example
Better?	Has quality or satisfaction improved?	Higher yield, higher customer satisfaction
Cheaper?	Have costs been reduced?	Decreased development, communication, and inventory costs
Faster?	Has speed increased?	Production efficiency, time-to-market, service response speed
Do more?	Has capability or scope expanded?	New customer sources, market share, number of system users

Gantt Charts, PERT, and CPM

Operational intelligence projects can be planned, scheduled, and controlled using Gantt charts, PERT, and CPM.

Tool	Suitable Use	Focus
Gantt Chart	Simple project scheduling and progress monitoring.	Shows activity time and progress, but is not good at showing activity dependencies.
PERT	Estimation when task duration is uncertain.	Uses three time estimates: optimistic, most likely, and pessimistic.
CPM	Finding the shortest project completion time and critical path.	Activities on the critical path directly affect the project completion date.

Common steps for PERT and CPM:

Define the project and create a Work Breakdown Structure (WBS).
Identify all activities and their sequence.
Draw an activity network diagram.
Assign time and cost estimates to each activity.
Calculate the longest time path in the entire network.
Use the network diagram to assist in planning, scheduling, and controlling the project.

PERT Expected Time Formula

Given optimistic time $O$ , most likely time $M$ , and pessimistic time $P$ , the PERT expected time is:

T E = \frac{O + 4 M + P}{6}

The formula is essentially a weighted average, with the numerator being $O \times 1 + M \times 4 + P \times 1$ . 4 is the weight of the most likely time (making it dominate the estimate), and 6 is the sum of the weights ( $1 + 4 + 1$ ). Both are fixed constants, independent of the number of activities or the shape of the network diagram.

The variance is:

σ^{2} = {(\frac{P - O}{6})}^{2}

Taking the activity "AI Customer Service Model Training" as an example, the engineer estimates 2 days optimistic (everything goes smoothly), 5 days most likely (normal progress), and 14 days pessimistic (serious errors):

T E = \frac{2 + 4 \times 5 + 14}{6} = \frac{36}{6} = 6 days

σ^{2} = {(\frac{14 - 2}{6})}^{2} = 2^{2} = 4, σ = 2 days

Although 5 days is the most frequent occurrence, the pessimistic time is as long as 14 days, so the weighted expected time is pulled to 6 days, which is the number filled into the network diagram. The standard deviation of 2 days is an input for subsequent risk analysis. However, to answer probability questions like "What is the confidence level of completing within 8 days?", one cannot simply apply a normal distribution to a single activity (a single activity duration approximates a Beta distribution); the standard approach is to first aggregate the means and variances of all activities along the critical path into the completion distribution of the entire project. After summing multiple activities, it approaches a normal distribution according to the Central Limit Theorem, and then the probability is estimated. If the project completion average is 6 days and the standard deviation is 2 days, 8 days is exactly +1 standard deviation, with a probability of about 84% (50% below the mean, plus 34% for 0 to +1σ; see Descriptive Statistics and Common Statistical Concepts for the breakdown).

PERT/CPM Network Diagram Illustration

Each node in the diagram represents an activity (A is start, F is end), and the arrows represent the dependency relationship between activities. The pointed-to activity must wait for all preceding activities to complete before it can start. The number of days on the node is the expected time calculated for each activity using the PERT formula. Summing the days for each path:

A→B→D→F: 2 + 4 = 6 days
A→C→D→F: 3 + 4 = 7 days
A→C→E→F: 3 + 5 = 8 days

The longest path, A→C→E→F, is the critical path (red nodes). The project only ends when all paths are completed, so the total duration is determined by the longest path, which is 8 days. The "shortest completion time" mentioned by CPM refers precisely to the length of this longest path; it cannot be rushed any faster.

Activities on the critical path have no slack. If C or E is delayed by one day, the total duration increases by one day. Activities on non-critical paths have Float. Taking B as an example, D must wait for both B and C to complete. Since D takes 4 days, it can start as late as the 4th day and still finish synchronously with E on the 8th day. Therefore, B can be delayed by up to 2 days without affecting the project duration; exceeding this will push the project back.

Basic Data Analysis

Data Processing Chain

Data must go through a series of processes to transform from raw records into information that can be used for decision-making. The typical flow is that data first enters a database, is then organized into a data warehouse according to analysis purposes, then patterns are found through data mining, and finally, it is presented to decision-makers through visualization.

Stage	Focus
Data	Information that can be recorded, which may be numbers, text, images, or sound.
Database	Storing daily operational data.
Data Warehouse	Integrating data from multiple sources for analysis purposes.
Data Mining	Finding correlations, patterns, trends, or prediction rules.
Data Visualization	Presenting analysis results through charts, dashboards, or reports.

Data Sources and Datafication

Data is not necessarily just numbers; any information that can be recorded can become data. Enterprise data sources include transaction data, customer data, supply chain data, sensor data, customer service records, social media text, images, and video files.

Datafication is the process of converting phenomena that are not easily quantifiable into analyzable data. For example, converting customer service calls into text for sentiment analysis, or converting equipment vibration signals into time series to predict failures.

Structured, Semi-structured, and Unstructured Data

Data Type	Characteristics	Example	Pros	Limitations
Structured Data	Fixed fields, format, and order.	Sales data tables, member data tables	Easy to query, integrate, and analyze.	Low flexibility; data not conforming to field rules is hard to store.
Semi-structured Data	Has field concepts, but fields can be added/removed.	CSV, JSON, XML	Easy to exchange and extend.	Fields are not necessarily consistent; higher data governance requirements.
Unstructured Data	No fixed fields or fixed format.	Images, videos, Email, web pages, customer service recordings	Can preserve rich content.	Usually requires conversion before analysis; information may be lost during conversion.

Data Quality and Preprocessing

Data quality directly affects analysis results. Common data quality problems include missing values, duplicate data, inconsistent formats, inconsistent units of measurement, inconsistent field definitions, and outliers.

Problem	Description	Direction of Handling
Missing Values	Field not filled or unobtainable.	Imputation, deletion, marking the reason for missing.
Duplicate Data	The same entity is recorded repeatedly.	Deduplication, primary key comparison, merging records.
Inconsistent Format	Chaotic date, phone, address formats.	Standardizing formats.
Inconsistent Units	Mixing kilograms and grams, TWD and USD.	Unifying units and retaining conversion rules.
Inconsistent Definition	Different departments define the same field differently.	Establishing a data dictionary and common definitions.
Outliers	Outside reasonable range or violating rules.	Investigating the cause, correcting, or retaining and marking.

Common meanings of data quality:

Accuracy: Does the data reflect the true state?
Reliability: Are the data source and production process credible?
Consistency: Are definitions consistent across different systems or fields?
Completeness: Are necessary fields complete?
Relevance: Is the data relevant to the originally set business goals?

Missing Value Handling Methods

Handling missing values is one of the most common tasks in data preprocessing, and different methods are suitable for different scenarios.

Method	Description	Suitable Scenario
Listwise Deletion	Directly deleting the entire record containing missing values.	Very low missing ratio; deletion does not affect sample representativeness.
Pairwise Deletion	Excluding the record only when the field is used; other analyses still include it.	Multivariate analysis; hope to retain more data.
Mean/Median Imputation	Filling with the mean or median of the field.	Numerical fields; use mean for normal distribution, median for skewed distribution.
Linear Interpolation	Using known values before and after the missing point for linear interpolation.	Continuous data like time series; retains local trends, avoids flattening changes caused by mean imputation.
Mode Imputation	Filling with the most frequent category in the field.	Categorical fields.
Regression Imputation	Using other fields to build a regression model to predict missing values.	Missing field has a clear correlation with other fields.
KNN Imputation	Filling with the average of the K most similar records.	Clear multi-field features, sufficient samples.
Multiple Imputation	Generating multiple sets of possible values and merging analysis results.	High-quality research, complex missing mechanisms.
Retain and Mark	Retaining missing, adding an "is missing" flag field.	Missing itself has meaning (e.g., not filling is also information).

Data Storage: From Databases to Data Lakehouses

Concept	English	Characteristics
Database	Database	Stores daily operational (OLTP) data, highly normalized.
Data Warehouse	Data Warehouse	Integrates historical data from multiple operational systems, fixed Schema, for enterprise-level analysis.
Data Mart	Data Mart	A subset of a data warehouse, focusing on specific departments or themes (e.g., marketing, finance).
Data Lake	Data Lake	Stores structured, semi-structured, and unstructured data in raw format; Schema is defined only during analysis (Schema-on-Read).
Data Lakehouse	Data Lakehouse	Combines the management capabilities of a data warehouse with the flexibility of a data lake, featuring both ACID transactions and multi-format storage.

Database vs. Data Warehouse

Both store data, but serve opposite purposes:

Databases are "operational" oriented, storing current real-time data, frequent writes, and highly normalized to reduce redundancy (corresponds to OLTP).
Data warehouses are "analysis" oriented, integrating historical data from multiple sources, query-focused, and denormalized to speed up aggregation (corresponds to OLAP).

Databases answer "what is the current state," while data warehouses answer "what is the long-term trend."

ETL and ELT

The foundation of a data warehouse is ETL (Extract, Transform, Load), which is extracting data from source systems, cleaning and transforming it, and then loading it into the data warehouse.

The study guide also uses ECCD to describe the data processing flow:

Step	English	Description
Extract	Extract	Taking data from raw data sources.
Clean	Clean	Confirming data quality, handling missing, erroneous, and inconsistent data.
Conform	Conform	Unifying definitions, formats, and dimensions across data sources.
Delivery	Delivery	Delivering usable data to application systems or decision-makers.

When planning an ETL system, both requirements and architecture must be considered simultaneously.

Type	Consideration Items
Requirements Analysis	Business, regulations, quality, security, integration, schedule, backup, delivery, skills, resources
Architecture Design	Tool procurement or self-development, batch or streaming, schedule automation, exception handling, quality control, recovery and restart, metadata, data security

Differences between ETL and ELT

ETL and ELT consist of the same three steps, but the difference lies in the sequence of Transform and Load, which also changes where Transform is executed:

Step	ETL	ELT
Extract	Extract raw data from source systems.	Extract raw data from source systems.
Transform	Clean and apply business rules in external tools before loading.	Transform using the platform's computing power after loading.
Load	Finally, write the organized clean data into the data warehouse.	First, write raw data directly into the data lake or data lakehouse.

ETL is suitable for traditional data warehouses. Taking monthly financial closing as an example, currencies are unified, duplicate transactions are removed, and missing values are filled in an external tool before loading into the warehouse. Data quality is high, but the entire process must be rerun if business rules change.
ELT is suitable for data lakehouses and cloud data platforms. Taking an operational analysis platform as an example, order, POS, and customer service records are loaded in full, and then financial summaries, sales analysis, and customer segmentation datasets are generated according to needs. Raw data is kept intact, and when new analysis needs arise, one can go back and re-transform without being limited by the initial design.

Why ELT has risen

Early database storage costs were high, and computing and storage were tied to the same machine; transforming and reducing volume externally before loading was a necessary practice at the time.
Cloud object storage costs have dropped significantly, making full-volume loading feasible; modern cloud data platforms separate computing from storage, allowing for on-demand scaling of computing power for transformation within the platform.
ETL's aggregation and cleaning are destructive processes; once raw details are aggregated, they disappear. The need to retain raw data makes ELT a common choice on modern platforms.

OLTP and OLAP

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two different database operating modes for different purposes.

Aspect	OLTP	OLAP
Main Purpose	Processing daily transactions (add, modify, delete)	Analysis and decision support
Data Volume	Small per transaction, moderate total	Large amount of historical data
Operation Type	Large number of short transactions, frequent writes	Small number of complex queries, read-focused
Data Structure	Highly normalized	Denormalized (star, snowflake)
Typical System	ERP, POS, order systems	Data warehouse, BI platform

Five OLAP Operations

Multi-dimensional analysis can be imagined as operating on a Data Cube. Suppose this cube has three dimensions: time (year/month/day), location (country/city), and product. Each cell stores the sales amount for that combination. The five operations are different ways of viewing this cube.

Operation	English	Description and Example
Roll-up	Roll-up	Aggregating from fine to coarse along a dimension, e.g., summing daily sales of cities into monthly performance for the whole country.
Drill-down	Drill-down	The reverse of roll-up, expanding from coarse to fine, e.g., clicking from national performance to see details by city.
Slice	Slice	Fixing a single value for one dimension, taking a plane, e.g., looking only at "2026".
Dice	Dice	Setting limits on multiple dimensions, taking a smaller sub-cube, e.g., "2026 × Taipei × Mobile Phone".
Pivot	Pivot	Not changing data, only changing the presentation direction of dimensions, e.g., swapping report rows and columns to read from a different perspective.

The essence of the three sets of operations is different. Roll-up/Drill-down changes the granularity of data; Slice/Dice changes the scope of data; Pivot only changes the presentation angle.

Distinguishing Slice, Dice, and Drill-down

Slice vs. Dice: Both are filtering data; the difference is only in how many dimensions are fixed. Slice fixes one dimension and takes a plane (3D to 2D); Dice fixes multiple dimensions and takes a sub-cube.
Drill-down vs. Slice: Both are like narrowing the focus, but Drill-down goes down the same dimension to become finer (changing granularity), while Slice filters out other values of a dimension (changing scope), and the report will have one fewer dimension.

Star Schema and Snowflake Schema

Data warehouses often organize data using Dimensional Modeling, splitting data into two types of tables. The Fact Table is placed in the center, recording measurement numbers of "what happened," such as sales amount and purchase quantity; Dimension Tables surround it, recording the background of these numbers, such as when, where, and which product was purchased. The difference between star and snowflake schemas lies in how dimension tables are arranged.

Model	Characteristics	Trade-off
Star Schema	Center is the fact table, surrounded by directly connected dimension tables; dimension attributes are flattened in the same table.	Simple structure, few JOINs, fast queries, but dimension tables have duplicate data.
Snowflake Schema	Re-normalizes dimension tables, splitting duplicate attributes into multi-layer sub-dimension tables.	Saves storage, reduces redundancy, but more JOINs, slower queries.

Taking the product dimension as an example, the star schema flattens product name, category, and category manager into the same product dimension table. This hides a transitive dependency: product determines category, and category determines manager. The star schema tolerates this, so 1,000 electronic products will have the category and manager written 1,000 times. The snowflake schema extracts the category into an independent sub-table, eliminating this transitive dependency, effectively pushing the dimension table to the third normal form (3NF). The cost is an extra JOIN to look up the category name.

Additionally, normalization only affects dimension tables; the structure of the fact table is the same in both models. Fact tables only contain foreign keys and measurement values (e.g., product ID, customer ID, quantity, amount), no descriptive text, so there are no transitive dependencies to split.

Viewing Fact and Dimension Tables with Simulated Data

The sales fact table only has foreign keys and measurement values, no text descriptions:

Date ID	Product ID	Customer ID	Quantity	Amount
20260301	101	501	2	1000
20260301	102	502	1	30000
20260302	101	503	5	2500

The star schema's product dimension table flattens descriptive attributes; category and manager will repeat:

Product ID	Product Name	Category	Category Manager
101	Bluetooth Earphones	Electronics	Wang Xiaoming
102	Laptop	Electronics	Wang Xiaoming
103	Office Chair	Furniture	Li Dahu

The snowflake schema extracts the category into a sub-table, the product table only keeps the category ID, and "Electronics/Wang Xiaoming" no longer repeats:

Product ID	Product Name	Category ID
101	Bluetooth Earphones	C01
102	Laptop	C01
103	Office Chair	C02

Category ID	Category	Category Manager
C01	Electronics	Wang Xiaoming
C02	Furniture	Li Dahu

Star vs. Snowflake: Three Trade-off Aspects

Star and snowflake schemas each have their strengths. In practice, they are weighed across three aspects:

Query Speed: Star has fewer JOINs, faster; snowflake has more JOINs, slower.
Storage Space: Star tolerates duplication, takes more space; snowflake eliminates redundancy, saves more.
Maintainability: Snowflake keeps duplicate attributes in one place, update one record; star has the same attribute repeated in many rows, requiring all to be changed during updates, potentially causing inconsistencies (update anomalies).

However, in data warehouses, the weight of maintainability is lower than in transactional databases because warehouses are focused on read-based analysis and rarely modify historical data; the update burden caused by star duplication does not occur often. Therefore, most scenarios still prefer the star schema for its fast queries and intuitive SQL, and cloud platforms with low storage costs have further reinforced this choice.

Data Visualization and Chart Selection

The purpose of data visualization is to convey key points to decision-makers using appropriate charts without distorting the data.

The first step in choosing a chart is to think clearly about what relationship you want to express, then pick the corresponding chart.

Analysis Purpose	Suitable Chart	Description
Compare Categories	Bar chart, horizontal bar chart	Comparing quantities or sizes of different categories; horizontal bar charts are easier to read when there are many categories.
Show Trends	Line chart	Observing changes in values over time.
Show Composition	Pie chart, stacked bar chart	Seeing the proportion of parts in a whole; categories should not be too many.
Observe Distribution	Histogram, box plot	Histograms show the distribution shape of continuous values; box plots show median, quartiles, and outliers.
Observe Correlation	Scatter plot	Observing the relationship between two variables.
Show Cross Density	Heat map	Presenting the magnitude of 2D cross-values with color depth, e.g., sales heat by time period × region.

Bar Chart vs. Histogram

Bar charts are commonly used for qualitative data, such as product categories, regions, and departments.
Histograms are commonly used for quantitative data, such as amounts, height, and working hours.

Common Misleading Visualizations

The same data can give completely different impressions depending on how it is drawn. The following are four common misleading techniques.

Truncating the Y-axis: If the Y-axis does not start from zero, it will magnify原本微小的差距 (originally small differences). For example, with values 85 and 90, the two bars look almost the same height when the Y-axis is 0 to 100; after changing to 85 to 100, the height difference is severely magnified, and 90 looks far higher than 85. When comparing absolute quantities, the Y-axis should start from zero; when local differences need to be emphasized, the axis range must be clearly marked.

Comparison of bar charts for the same set of values (85 and 90) with Y-axis ranges 0-100 and 80-95; the truncated axis on the right magnifies the gap

3D Pie Chart: 3D perspective makes the sectors closer to the front appear larger than they actually are due to the angle, while the rear sectors are compressed, leading to distorted proportion interpretation. To show proportions, flat pie charts or bar charts are more accurate than 3D.

Four sectors each accounting for 25% are presented in 3D and flat pie charts; the 3D version's front sector looks larger than the rear, but the actual proportions are the same

Dual Y-axis: When two sets of data are hung on the left and right Y-axes respectively, the scales of the two axes can be adjusted independently. As long as the right scale is chosen, two lines that are originally unrelated can be made to look like they rise and fall together, implying a correlation that does not exist. To compare two sets of values, use the same baseline to see the true relationship.

Two unrelated time series; the left chart uses dual Y-axes to adjust the scales, making the two lines appear synchronized; the right chart uses a single baseline, revealing they are actually unrelated

Information Overload: Putting too many series or elements in one chart makes it impossible for the reader to grasp the key point. One chart should tell one story.

Descriptive Statistics and Common Statistical Concepts

Data analysis usually starts with descriptive statistics to understand the central tendency, dispersion, and distribution state of the data, and then proceeds to correlation analysis and hypothesis testing.

Metric	Description	Characteristics
Mean	Sum of all values divided by the number of records.	Easily affected by extreme values.
Median	The value in the middle after sorting.	Suitable for skewed distributions or when extreme values exist.
Mode	The value that appears most frequently.	Can also be used for categorical data.
Standard Deviation	Measuring the degree of data dispersion.	The larger the standard deviation, the more dispersed the data.
Quartiles	Cutting sorted data into four equal parts.	Box plots commonly use Q1, Q2, Q3.
Correlation Coefficient	Measuring the direction and strength of linear correlation.	No linear correlation does not mean no relationship; correlation does not mean causation.

Common statistical and probability concepts:

Conditional Probability: The probability of one event occurring given that another event has occurred.
Independent Events: The occurrence of one event does not affect the probability of another event occurring.
Confidence Interval: Estimating the possible range of population parameters using an interval.

Bayes' Theorem

A reverse application of conditional probability, using new evidence to update original probability judgments. General conditional probability answers "Given that he is a target customer, what is the probability he clicks the ad?"; Bayes' Theorem answers the reverse question, "Given that he clicked the ad, what is the probability he is a target customer?". In practice, most of the evidence that can be directly observed is the latter (clicks, purchases, revisits), so this reverse deduction is the direction commonly used in marketing and customer analysis.

Let's walk through it with concrete numbers. Suppose target customers account for only 10% of the exposed population, and the click-through rate of this ad for target customers is 60%, while it is only 5% for others. Estimating per 1,000 exposures, 60 clicks are expected from target customers (100 people × 60%) and 45 from others (900 people × 5%), totaling 105. Under the condition "given that someone clicked the ad," the probability that they are a target customer is 60 ÷ 105, about 57.1%. This result depends only on those three ratios (prior 10% and two click-through rates), regardless of how many clicks are actually recorded in the backend.

Hypothesis Testing, p-value, and Type I/Type II Errors

Hypothesis testing usually starts by establishing a null hypothesis, which is a conservative stance of "no difference, no effect," such as "the conversion rate of the new webpage is the same as the old version," and then seeing if the data is sufficient to overturn it.

If there is really no difference in conversion rates between the new and old versions, but such a large or even larger gap appears before our eyes, the probability of this situation occurring is the p-value. When the p-value is lower than the pre-set significance level (e.g., 0.05), it means this result is so low that it doesn't look like something that should happen when there is "no difference," thus overturning the null hypothesis. Note that the p-value is not "the probability that the null hypothesis is true," nor is it the size of the effect; these are two of the most common misinterpretations.

Errors can be made whether overturning or not, divided into two types:

Type I Error (False Positive): The null hypothesis is actually true but is overturned, misjudging "no effect" as "having an effect."
Type II Error (False Negative): The null hypothesis is actually false but is not overturned, misjudging "having an effect" as "no effect."

And the boundary of this significance level is essentially the upper limit of the Type I error probability one is willing to bear.

t-test, F-test, Chi-square test

Statistical testing uses sample data on hand to judge whether a hypothesis about the population is true; the preceding null hypothesis and p-value are its common judgment logic. Which test to use depends on the data type and the object to be compared. Common ones are the following three.

Test	Object of Comparison/Verification	Typical Question
t-test	Mean of one to two groups	Is there a difference in the average transaction amount between two groups of customers?
F-test	Variance of two groups (degree of data dispersion)	Is the quality fluctuation of two production lines consistent?
Chi-square test	Whether there is a correlation between two categorical variables	Is there a correlation between gender and whether they purchased?

ANOVA and F-test

To compare the means of three or more groups (e.g., whether the average revenue of three stores is the same), Analysis of Variance (ANOVA) is used, and its test statistic is the F-value. It is called "Analysis of Variance" but compares means because it compares "between-group variance" and "within-group variance." When the between-group difference is significantly larger than the random fluctuation within the group, it is judged that the means of the groups are not all equal.

Empirical Rule and Chebyshev's Theorem

Both answer the same question, "how much data is covered within ±k standard deviations of the mean," the difference being the requirement for the shape of the data.

The Empirical Rule only applies to data that is approximately normal (bell-shaped symmetric). Because the shape is known, the estimate can be very precise; about 68%, 95%, and 99.7% of the data is covered within ±1, 2, and 3 standard deviations, respectively.
Chebyshev's Theorem does not require any distribution shape. Because the shape is uncontrollable, it can only give a conservative lower bound; at least $1 - \frac{1}{k^{2}}$ of the data falls within ±k standard deviations of the mean.

The common 75% is calculated by substituting $k = 2$ , $1 - \frac{1}{2^{2}} = \frac{3}{4}$ ; $k = 3$ is $1 - \frac{1}{3^{2}} = \frac{8}{9}$ , about 88.9%.

So, for the same ±2 standard deviations, when data is approximately normal, one can say "about 95%"; when the shape is unknown or clearly skewed, one retreats to citing Chebyshev and says "at least 75%." The former is a precise estimate gained from a beautiful shape, and the latter is a minimum guarantee gained from not picking a shape.

Chebyshev's Theorem Calculation Example

The theorem can be used in two directions.

Given several standard deviations, find the minimum proportion: A factory's daily output averages 500 units, standard deviation 20 units. How much data is covered by at least 460 to 540 units? 460 and 540 are both 40 units from the mean, which is exactly 2 standard deviations ( $k = 2$ ). Substituting into $1 - \frac{1}{2^{2}} = \frac{3}{4}$ , so at least 75%.

Given the proportion to be guaranteed, find the interval: A distribution center averages 40 minutes for delivery, standard deviation 8 minutes. To guarantee that at least 84% of orders fall within a symmetric interval, how large should the interval be? Solving $1 - \frac{1}{k^{2}} = 0.84$ gives $k^{2} = 6.25$ , $k = 2.5$ . The interval is $40 \pm 2.5 \times 8 = [20, 60]$ minutes.

Normal Distribution and Skewness

The shape of the data distribution affects the choice of central tendency metrics and analysis methods.

Distribution Type	Shape	Relationship between Mean, Median, Mode	Typical Example
Symmetric (Normal)	Bell-shaped symmetric	The three coincide or are very close	Height, weight
Right-skewed (Positive)	Long tail on the right, peak on the left	Mode < Median < Mean	Income, housing prices, consumption amount
Left-skewed (Negative)	Long tail on the left, peak on the right	Mean < Median < Mode	Retirement age, high-score group in exams

When data is clearly skewed, the mean is pulled by extreme values at the tail, and the median is usually a more representative central tendency metric.

Data Mining and Machine Learning

Data mining is finding patterns, correlations, and knowledge that can support decision-making from large amounts of data. Common methods include decision trees, cluster analysis, association rule mining, and machine learning.

Method	Focus	Common Application
Decision Tree	Using a series of judgment conditions for classification or prediction.	Customer churn prediction, credit risk classification
Cluster Analysis	Dividing similar data into groups.	Market segmentation, customer segmentation, product portfolio
Association Rule Mining	Finding relationships where items often appear together.	Market basket analysis, cross-selling
Regression Model	Predicting continuous numerical targets.	Sales amount prediction, demand prediction
Classification Model	Predicting discrete categorical targets.	Whether to churn, whether to default, whether to purchase

Machine learning can be divided into three categories.

Type	Data Characteristics	Goal	Example
Supervised Learning	Labeled data.	Learning the mapping from input to answer.	Classification, regression
Unsupervised Learning	No manually labeled answers.	Exploring potential structures in data.	Clustering, dimensionality reduction, association exploration
Reinforcement Learning	Learning through action and feedback.	Finding strategies that accumulate better rewards.	Dynamic pricing, path planning, resource scheduling

Regression predicts continuous values (e.g., tomorrow's sales), and classification predicts discrete categories (e.g., whether a customer will churn). Both belong to supervised learning, the difference being the type of target variable.

Data Splitting and Overfitting

When building a model, data is usually split into three parts to prevent the model from "rote memorizing" training data but failing to generalize to new data.

Split	English	Purpose
Training Set	Training Set	Letting the model learn parameters from it, about 60-70%.
Validation Set	Validation Set	Adjusting hyperparameters, selecting model versions, about 15-20%.
Test Set	Test Set	Final evaluation of generalization ability, used only once, about 15-20%.

Phenomenon	English	Characteristics	Countermeasure
Overfitting	Overfitting	Good performance on training set, poor on test set; model rote memorizes noise.	Regularization, simplifying model, increasing data, early stopping, cross-validation.
Underfitting	Underfitting	Poor performance on both training and test sets; model is too simple.	Increasing features, increasing model complexity, reducing regularization.

Cross-Validation splits training data into K parts, each time using 1 part for validation and the remaining K-1 parts for training, averaging the results after repeating K times, commonly K=5 or K=10. It can estimate model performance more robustly when data volume is limited.

Regression Model Evaluation Metrics

Regression models commonly use several metrics to measure prediction error, the difference being sensitivity to "large errors" and ease of interpretation.

Metric	Full Name	Focus
MAE	Mean Absolute Error	Average of absolute errors; treats every error equally, less affected by extreme values.
RMSE	Root Mean Squared Error	Error is squared then square-rooted; magnifies large errors, used when extreme prediction errors must be severely punished.
MAPE	Mean Absolute Percentage Error	Expresses error as a percentage; can compare across different magnitudes, but distorts when actual values are close to 0.
R²	Coefficient of Determination	Proportion of overall variance explained by the model; looks at overall fit, not a penalty for single-record errors.

Classification Model Evaluation Metrics

Evaluation of classification models starts from the Confusion Matrix, dividing prediction results into four cells: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN), and then calculating metrics from them.

Metric	English	Algorithm	Question Answered
Accuracy	Accuracy	(TP + TN) ÷ Total	How many were predicted correctly overall? Distorts when classes are unbalanced.
Precision	Precision	TP ÷ (TP + FP)	Of those predicted positive, how many are actually positive?
Recall	Recall	TP ÷ (TP + FN)	Of those actually positive, how many were caught?
F1 Score	F1 Score	Harmonic mean of precision and recall	Comprehensive metric when both need to be balanced.

Precision and recall usually pull against each other, with trade-offs based on error costs. For example, fraud detection has a high cost for missing, so it favors recall; marketing list mis-sending has a low cost but harasses customers, so it favors precision.

CRISP-DM Data Mining Standard Process

CRISP-DM (Cross Industry Standard Process for Data Mining) is a data mining methodology widely adopted in the industry, containing six stages, which can backtrack to the previous stage as needed.

Data Mining Process Methodology Comparison

Besides CRISP-DM, KDD (Knowledge Discovery in Databases) and SEMMA are also common data mining processes. KDD appeared earlier and is more academic; SEMMA was proposed by SAS, with the name taken from the initials of the five stages: Sample, Explore, Modify, Model, Assess. The goals of the three are similar, the difference being in stage division and orientation.

Methodology	Stages	Orientation
CRISP-DM	Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment	Business-oriented; starts with business goals, ends with deployment.
KDD	Selection → Preprocessing → Transformation → Data Mining → Interpretation and Evaluation	Data technology perspective, academic-oriented.
SEMMA	Sample → Explore → Modify → Model → Assess	Tool operation level, lacks business understanding and deployment stages.

Business Management Basic Knowledge

Business Environment and Strategic Management

Enterprises must evaluate the environment because organizational strategy, organizational structure, and market performance are all affected by the environment. Chandler's linkage theory can be simplified as "Environment → Strategy → Structure."

Perspective	Description
Strategic Perspective	Organizations adopt different strategies in stages such as startup, growth, maturity, and decline.
Market Perspective	Organizations turn inputs into outputs and must master the market to survive and grow.
Competitive Perspective	Enterprises must provide outputs more attractive than competitors and continue to profit.

The enterprise environment can be divided into internal and external environments.

Type	Factors
Internal Environment	Shareholders, board of directors, organizational culture, organizational structure, employee attitudes and values, management procedures and methods
Specific Environment	Customer groups, suppliers, competitor groups, financial institutions, shareholder groups, government, pressure groups
General Environment	Economy, politics and law, social culture, technology, population structure, natural ecology, internationalization

PEST and PESTEL Analysis

PEST/PESTEL is a common framework for analyzing the general environment, systematically taking stock of external influencing factors from multiple aspects.

Abbreviation	Aspect	Content
P	Political	Government policies, political stability, trade agreements.
E	Economic	GDP, interest rates, exchange rates, inflation, consumption power.
S	Social	Population structure, culture, lifestyle, education level.
T	Technological	Technology trends, R&D investment, automation, intellectual property.
E	Environmental	Climate change, sustainability regulations, energy, pollution (added in PESTEL).
L	Legal	Labor laws, consumer protection, fair trade, industry regulations (added in PESTEL).

PEST is the original version, and PESTEL (also called PESTLE) adds environmental and legal aspects, responding to the increasing importance of sustainable development and regulatory compliance issues.

Competition and Strategy Analysis Frameworks

Porter's Five Forces analysis is used to evaluate industry competitive pressure. All five forces are related to the enterprise's specific environment.

Five Forces	Core Question	Situation where this force is stronger
Supplier Bargaining Power	Can suppliers raise prices or lower supply conditions?	Suppliers are concentrated, few alternative sources, high switching costs, suppliers can integrate forward.
Customer Bargaining Power	Can customers lower prices or demand more services?	Buyers are concentrated or purchase in large volumes, products are standardized, low switching costs, buyers can integrate backward.
Threat of New Entrants	Is it easy for new entrants to enter the market?	Low entry barriers, such as economies of scale, brand, capital, channels, and regulatory barriers are all not high.
Threat of Substitutes	Can other products or services replace existing needs?	Substitutes have high price-performance ratios, low customer switching costs.
Rivalry Among Existing Firms	Is competition within the industry intense?	Many competitors of similar strength, industry growth is slowing, fixed or exit barriers are high, product differences are small.

The four arrows in the diagram point to the central rivalry among existing firms because the other four forces affect the intensity of competition within the industry. However, rivalry among existing firms is also one of the five forces; the five together determine the competitive intensity and profit space of the entire industry.

Porter's Three Competitive Strategies

After completing the Five Forces analysis, enterprises can choose corresponding competitive strategies based on the industry situation.

Strategy	English	Focus	Suitable Scenario
Cost Leadership	Cost Leadership	Providing the lowest cost at the same quality, winning through economies of scale and efficiency.	Mass market, price-sensitive customers.
Differentiation	Differentiation	Providing products or services with unique value, customers are willing to pay a premium.	Brand, design, technology, or service has differentiation potential.
Focus	Focus	Targeting a specific market segment, adopting cost or differentiation strategies within that segment.	Niche market, specific group, or geographic area.

SWOT analysis is used to take stock of internal and external conditions.

Type	Source	Description
Strengths	Internal	Favorable conditions possessed by the organization.
Weaknesses	Internal	Deficiencies or limitations within the organization.
Opportunities	External	Favorable factors in the external environment.
Threats	External	Unfavorable factors in the external environment.

TOWS Matrix (Strategic Deduction of SWOT)

SWOT only takes stock of the status quo; the TOWS Matrix further crosses internal and external factors to produce four strategic directions.

Strategy	Combination	Direction
SO Strategy (Offensive)	Strengths + Opportunities	Use internal strengths to seize external opportunities.
WO Strategy (Reinforcement)	Weaknesses + Opportunities	Reinforce internal weaknesses to seize opportunities.
ST Strategy (Defensive)	Strengths + Threats	Use strengths to resolve external threats.
WT Strategy (Retreat)	Weaknesses + Threats	Downsize, transform, or exit to avoid double pressure.

The BCG Matrix analyzes business units using market growth rate and relative market share.

Type	Market Growth Rate	Relative Market Share	Management Implication
Star	High	High	Needs investment to maintain growth.
Cash Cow	Low	High	Maturity stage, stable cash flow.
Question Mark	High	Low	Need to evaluate whether to increase investment to expand market share, otherwise abandon/exit.
Dog	Low	Low	Usually need to downsize, transform, or exit.

Ansoff Growth Matrix

The Ansoff Matrix deduces four growth strategies from the new/old combinations of the two dimensions of "product" and "market."

Strategy	Product × Market	Description
Market Penetration	Existing Product × Existing Market	Increase purchase frequency of existing customers or market share.
Market Development	Existing Product × New Market	Expand existing products to new regions or new customer groups.
Product Development	New Product × Existing Market	Launch new products or new features for existing customers.
Diversification	New Product × New Market	Highest risk, stepping into new fields.

Manufacturing vs. Service Industries

Aspect	Manufacturing	Service
Output	Tangible products.	Intangible services.
Management Focus	Cost, quality, customization, and production efficiency.	Customer experience, process quality, and service consistency.
Quality Recognition	Easier to measure with objective specifications.	Often affected by interaction context and customer subjective feelings.
Customer Participation	Customers are usually not present during production.	Service provision and consumption often occur simultaneously.

Service industries have four common characteristics:

Intangibility: Services cannot be touched like physical products.
Inseparability: Service provision and consumption often occur simultaneously.
Variability: Service quality is affected by personnel, customers, and context.
Perishability: Unused service capacity cannot be stored.

Marketing and Product Management

Marketing is about creating exchange value, satisfying customer needs, and supporting enterprise profit. Common themes include marketing mix, consumer behavior, purchasing procedures, market segmentation, and product lifecycle.

Marketing Mix 4Ps

4P	Description
Product	Goods, services, brand, and added value provided to customers.
Price	Pricing strategy, discounts, and payment terms.
Place	How the product reaches the customer.
Promotion	Advertising, sales promotion, public relations, and sales communication.

4Ps Corresponding to 4Cs (Customer Perspective)

4Ps are from the enterprise perspective; 4Cs were proposed by Robert Lauterborn, reinterpreting the marketing mix from the customer's angle.

4P	4C	Perspective Conversion
Product	Customer Value	From "what product we make" to "what value the customer needs."
Price	Cost	From "how much to price" to "total cost paid by the customer."
Place	Convenience	From "where to stock" to "is it convenient for the customer to obtain."
Promotion	Communication	From "one-way advertising" to "two-way communication."

STP Marketing Strategy

STP is the core process of marketing strategy, connecting market analysis and marketing mix design.

Step	English	Description
Segmentation	Segmentation	Dividing the market into different segments based on geographic, demographic, psychological, behavioral, and other variables.
Targeting	Targeting	Selecting the target customer group the enterprise wants to serve from the segments.
Positioning	Positioning	Establishing a differentiated brand impression and value proposition in the minds of the target customer group.

Purchaser Decision Consumption Procedure

Need generation.
Information collection.
Evaluation of need solutions.
Purchase decision.
Post-purchase satisfaction evaluation.

Factors Influencing Purchasing Decisions

Consumer purchasing decisions are affected by multi-level factors, and marketing strategies need to be designed based on the dominant factors of the target customer group.

Level	Factors
Cultural Factors	Culture, subculture, social class.
Social Factors	Reference groups, family, roles, and status.
Personal Factors	Age, occupation, economic status, lifestyle, personality.
Psychological Factors	Motivation, perception, learning, beliefs, and attitudes.

Product Lifecycle

Stage	Characteristics	Management Focus
Introduction	Market acceptance is still low, sales volume is limited.	Build awareness, lower entry barriers.
Growth	Sales volume increases rapidly, new users join, profit climbs simultaneously.	Expand market, increase market share.
Maturity	Market is stable, sales reach peak, but competition intensifies, profit has passed its peak and started to decline.	Control costs, differentiation, maintain cash flow.
Decline	Demand drops, product may be replaced.	Decide to eliminate, transform, or maintain niche market.

Plotting the sales volume and profit curves for the four stages shows that the profit peak usually occurs earlier than the sales volume peak. In the latter part of the growth stage, competition is not yet intense, and costs have decreased with scale expansion, so profit hits the top first; after entering the maturity stage, although sales volume climbs to the highest, competition intensifies, prices and gross margins decline, and profit has already started to fall back.

Product Lifecycle Curve

Product Diffusion: Types of Innovation Adopters (Rogers' Diffusion of Innovations)

Everett Rogers divided consumers into five types based on the timing of adopting new products, presenting a normal distribution.

Type	English	Proportion	Characteristics
Innovators	Innovators	2.5%	Willing to try new things, tolerate high risk and imperfection.
Early Adopters	Early Adopters	13.5%	Opinion leaders, influence subsequent group adoption decisions.
Early Majority	Early Majority	34%	Adopt after deep thought, value practicality and word-of-mouth.
Late Majority	Late Majority	34%	Follow up only after the majority have adopted, more conservative.
Laggards	Laggards	16%	Tradition-oriented, accept or reject adoption last.

Research & Development and Human Resources

Research & Development (R&D) is an important activity for enterprises to maintain innovation and competitiveness, containing two major aspects: research and development.

Type	Description
Basic Research	Developing and verifying theories; results may or may not have practical application value.
Applied Research	Turning basic research results toward concrete applications and problem-solving.
Development and Engineering Design	Systematically applying knowledge and research results to products or services.
Experimental Production	Small-batch trial production to confirm design, molds, materials, quality, and delivery issues.

R&D has non-routine nature and high risk. Investing resources does not guarantee the generation of patents, products, or market success, and one must face tests of mass production, marketing, channels, and customer acceptance.

Human resource management includes activities such as recruitment and selection, training and development, personnel movement, performance evaluation, rewards, and human resource retention.

Activity	Focus
Human Resource Planning	Understanding existing human resource supply, evaluating future demand, planning supply-demand gaps.
Selection	Finding suitable candidates through resumes, interviews, written tests, and background checks.
Employee Training	Stimulating self-learning, practice-oriented, employee participation, real-time feedback.
Performance Evaluation	Identifying performance problems, consequences, and improvement plans.
Compensation and Benefits	Includes monetary rewards and non-monetary rewards.

Accounting, Finance, and Financial Metrics

The difference between accounting and financial management lies in the time perspective and work focus.

Item	Accounting	Financial Management
Time Perspective	Focused on historical information organization and reporting.	Focused on future fund planning and financial decisions.
Work Focus	Recording, aggregating, reporting transaction activities.	Investment, financing, fund utilization, and risk management.
Users	Internal managers and external information users.	Managers, investors, creditors, etc.

Metric	Type	Description
Marginal Profit Rate	Profitability	Measuring how much profit is left per unit of sales.
Return on Investment (ROI)	Profitability	Measuring the return brought by investment.
Earnings Per Share (EPS)	Profitability	Earnings corresponding to each share of common stock.
Current Ratio	Solvency	Measuring short-term debt repayment ability, not a profitability metric.
Inventory Turnover	Operational Efficiency	Measuring the speed of inventory conversion.

EOQ Economic Order Quantity Formula

Given annual demand $D$ , ordering cost per order $S$ , and holding cost per unit $H$ , the economic order quantity is:

E O Q = \sqrt{\frac{2 D S}{H}}

EOQ is the single order quantity that minimizes the sum of "ordering costs" and "holding costs," assuming stable demand, fixed lead time, and immediate replenishment.

Management Activities and Organization

Management is the process by which managers use organizational resources to achieve goals.

Achieving goals and using resources correspond exactly to the two aspects of measuring management performance:

Effectiveness: Whether the goal is appropriate, whether the right things are done.
Efficiency: The relationship between input and output, whether results are achieved with fewer resources.

Managers can be divided into first-line managers, middle managers, and top managers by level.

Different levels require different proportions of skills. First-line managers focus on technical skills, top managers focus on conceptual skills, and interpersonal skills are important at all levels.

Manager Skills	Description	Commonly Found In
Technical Skills	Using professional knowledge to perform work.	First-line managers need more.
Interpersonal Skills	Communication, cooperation, motivation, and leadership.	All levels need.
Conceptual Skills	Analyzing problems, evaluating plans, planning actions.	Top managers need more.
Political Skills	Building power bases and interest linkages.	Important when involving cross-departmental coordination.

Mintzberg's managerial roles can be divided into three categories.

Type	Role
Interpersonal Roles	Figurehead, leader, liaison
Informational Roles	Monitor, disseminator, spokesperson
Decisional Roles	Entrepreneur, disturbance handler, resource allocator, negotiator

Four functions of enterprise management:

Function	Description
Planning	Defining goals, formulating strategies and plans.
Organizing	Allocating resources, arranging work, and establishing responsibility relationships.
Leading	Motivating, communicating, coordinating, and handling conflicts.
Controlling	Monitoring performance, comparing goal and actual gaps, taking corrective measures.

PDCA Cycle (Deming Cycle)

PDCA is the foundation cycle for quality management and continuous improvement, proposed by Walter Shewhart and promoted by W. Edwards Deming, often regarded as the execution model of the four management functions.

Stage	Corresponding Management Function	Focus
Plan	Planning	Defining problems, setting goals, formulating action plans.
Do	Organizing + Leading	Executing according to plan, recording processes and data.
Check	Controlling	Evaluating effectiveness, identifying gaps with goals.
Act	Controlling + Planning	If successful, standardize; if not, correct and restart the cycle.

The extended version PDSA (Plan-Do-Study-Act) replaces "Check" with "Study," emphasizing in-depth analysis rather than simple checking. Six Sigma's DMAIC (Define, Measure, Analyze, Improve, Control) is an advanced version more focused on quality improvement projects.

Digital Enterprise Information Tools Basic Knowledge

Digital Enterprise and Digital Transformation

A digital enterprise is one that can use digital technology and information networks to communicate, collaborate, conduct electronic transactions, share data, and transform processes with customers, suppliers, and business partners. The focus of digital transformation is not just introducing tools, but changing operational processes, service delivery methods, and value creation models.

Operational Intelligence Information Technology

Technology	Focus	Operational Intelligence Application
AI	Supporting analysis and decision-making through machine learning, deep learning, and NLP.	Customer behavior prediction, inventory risk analysis, social media sentiment analysis.
Generative AI	Automatically generating text, images, and code using Large Language Models (LLM) and diffusion models.	Marketing copy, customer service dialogue, report summary, SQL auto-generation, knowledge Q&A.
Cloud Computing	Obtaining computing, storage, and platform resources on-demand via the network.	Expanding data platforms, reducing peak resource procurement pressure.
RFID	Tracking and identifying objects through radio frequency signals.	Material tracking, inventory counting, automated warehouse entry/exit.
IoT	Connecting sensors, equipment, and networks.	Equipment monitoring, smart logistics, smart agriculture, environmental monitoring.
Big Data	Processing data with high volume, velocity, variety, and uncertainty.	Marketing analysis, production risk analysis, customer insight.

Big Data 5V Characteristics

V	Chinese	Description
Volume	量大	Data volume reaches TB, PB levels.
Velocity	速度快	High frequency generation and processing requirements.
Variety	多樣性	Structured, semi-structured, unstructured coexist.
Veracity	真實性	Data quality and credibility vary.
Value	價值	Finding parts in massive data that can be converted into business decisions.

The iPAS official study guide lists 4V (Volume, Velocity, Variety, Veracity) for big data characteristics; Value is an extension common in other textbooks, collectively called 5V.

AI / ML / DL / GenAI Scope Relationship

The diagram is a simplified scope inclusion relationship. ML is a subset of AI, DL is a subset of ML, and the current mainstream GenAI is also implemented using deep learning.

There are three standard cloud service models defined by NIST: IaaS, PaaS, and SaaS; the table below also lists FaaS, which is common in serverless architectures, though it is not one of the three NIST standards.

Service Model	Full Name	Supplier Management Scope	Customer Management Scope	Typical Service
IaaS	Infrastructure as a Service	Physical machine, network, storage, virtualization	OS, middleware, execution environment, application, data	AWS EC2, Azure VM, GCP Compute Engine
PaaS	Platform as a Service	IaaS scope + OS, middleware, execution environment	Application, data	Heroku, Azure App Service, AWS Elastic Beanstalk
FaaS	Function as a Service	PaaS scope + execution environment management, auto-scaling	Function code, trigger settings	AWS Lambda, Azure Functions, GCP Cloud Functions
SaaS	Software as a Service	PaaS scope + application	Data (user operation level)	Gmail, Microsoft 365, Salesforce

NIST also defines four deployment models, describing who the cloud infrastructure is for:

Deployment Model	Description
Public Cloud	Operated by cloud service providers, open for general public rental, multi-tenant shared resources.
Private Cloud	Dedicated to a single organization, can be self-built or outsourced, highest control and isolation.
Community Cloud	Shared by multiple organizations with common needs (e.g., regulations, security).
Hybrid Cloud	Combining two or more deployment models, e.g., using private cloud normally, scaling to public cloud during peaks.

RFID systems include electronic tags, card readers, and application systems. Electronic tags are divided into active and passive:

Type	Power Source	Characteristics
Active RFID	Built-in battery.	Can actively send data, sensing range is larger.
Passive RFID	No built-in battery.	Generates current through electromagnetic induction after receiving card reader signals, sensing range is smaller.

IoT three-layer architecture:

Level	Description
Perception Layer	Embedding components with sensing, identification, and communication capabilities into objects.
Network Layer	Receiving information and processing it through wired or wireless transmission.
Application Layer	Combining data with industry scenarios to provide specific services.

Organizational Levels of Information Systems

Enterprise information systems can be classified by the organizational level they serve, forming a pyramid structure from base-level daily transactions to high-level strategic decision-making. The higher the level, the more aggregated the data and the more unstructured the decisions faced.

System	Chinese	English Full Name	Service Level	Focus
TPS	Transaction Processing System	Transaction Processing System	Operational	Recording and processing daily operational transactions, such as orders, shipping, invoicing, payroll.
MIS	Management Information System	Management Information System	Management	Aggregating TPS transaction data, generating routine management reports and summaries.
DSS	Decision Support System	Decision Support System	Management	Supporting semi-structured decision-making with interactive models and analysis tools.
EIS	Executive Information System	Executive Information System	Strategic	Integrating internal/external information, supporting high-level strategic decisions with dashboards (also called ESS).
ES	Expert System	Expert System	Cross-level	Organizing specific domain expert knowledge into a rule base to assist judgment and diagnosis.

Structured, Semi-structured, and Unstructured Decisions

Decisions are divided into three types based on "whether there is a clear processing procedure," which is also the basis for information system layering:

Structured decisions: Clear rules and steps, repeatable, can even be automated, such as inventory replenishment, payroll calculation. Mostly handled by TPS/MIS.
Semi-structured decisions: Partially have rules, partially require human judgment, such as budget compilation, marketing resource allocation. This is the home field of DSS.
Unstructured decisions: Problems are novel, no ready-made procedures, highly dependent on experience judgment, such as new business investment, mergers and acquisitions. Mostly fall into the EIS at the strategic level.

The higher up the pyramid, the more unstructured the decisions become.

TPS is the data foundation for other systems; MIS and DSS both rely on transaction data accumulated by TPS. The difference between MIS and DSS is that the former generates fixed-format routine reports to answer "what happened," while the latter performs interactive analysis through models to answer "what if we do this." The functional systems in the next section, such as ERP, CRM, and SCM, span multiple levels, containing both transaction processing and report analysis capabilities internally.

Common Information Systems for Digital Enterprises

System	Chinese	English Full Name	Core Purpose
ERP	Enterprise Resource Planning	Enterprise Resource Planning	Integrating internal enterprise finance, manufacturing, production, sales, HR, and other processes and data.
MRP	Material Requirements Planning	Material Requirements Planning	Calculating raw material procurement and production needs based on production schedules and bills of materials.
MES	Manufacturing Execution System	Manufacturing Execution System	Connecting production schedules and on-site equipment, real-time grasp of work orders, output, and quality.
PLM	Product Lifecycle Management	Product Lifecycle Management	Managing data and processes of products from design, R&D, mass production to elimination.
SCM	Supply Chain Management	Supply Chain Management	Managing supply chain upstream and downstream planning and execution.
CRM	Customer Relationship Management	Customer Relationship Management	Managing customer data, interactions, sales channels, and service records.
EC	Electronic Commerce	Electronic Commerce	Supporting electronic transactions, information sharing, and relationship maintenance.
KM	Knowledge Management	Knowledge Management	Supporting storage, retrieval, creation, transfer, and application of organizational knowledge.
BI	Business Intelligence	Business Intelligence	Collecting, integrating, analyzing, and presenting data to support decision-making.
RPA	Robotic Process Automation	Robotic Process Automation	Automatically executing tasks with clear rules and high repetition using software robots.
HRIS	Human Resource Information System	Human Resource Information System	Managing recruitment, attendance, payroll, assessment, and personnel data.

ERP

ERP is a finance and accounting-oriented integrated information system used to plan, control, and integrate enterprise resources and information from order taking, manufacturing, shipping to settlement reports. It usually consists of integrated software modules and a centralized database.

Common ERP modules include:

Finance and Accounting.
Human Resources.
Manufacturing and Production.
Sales and Marketing.
Procurement and Inventory.
Supply Chain Management.

Manufacturing Information Systems: MRP, MES, and PLM

There are several information systems in the manufacturing field that often appear together, dividing labor and collaborating. ERP evolved from MRP (MRP → MRP II → ERP); MES and PLM are complementary systems connected to ERP, focusing on on-site execution and product/engineering data management respectively.

System	Positioning	Relationship with ERP
MRP	Calculating material procurement and production schedules based on master production schedules and bills of materials (BOM).	The predecessor of ERP, later expanded to MRP II covering capacity and finance, then evolved into ERP.
MES	Connecting ERP production plans and on-site equipment, real-time reporting of work order progress, output, and quality.	Fills the execution gap between ERP and on-site machines, and feeds actual production data back to ERP.
PLM	Managing engineering data and processes of products from design, R&D, mass production to elimination.	Provides product master data and BOM sources for ERP and MES to reference.

MRP solves "when to prepare how much material," MES solves "how the site is actually performing," and PLM solves "how to manage product data itself." After connecting with ERP, they constitute a complete information chain for manufacturing from order taking, material preparation, production to shipping.

"Predecessor" means absorbed, not replaced

In information systems, saying MRP is the "predecessor" of ERP means that MRP's functions are covered by the broader ERP and become one of its modules, not that MRP disappeared after being replaced by another system. Therefore, ERP itself contains MRP's material calculation functions, and independent MRP systems are almost never seen on the market.

MES and PLM coexist with ERP because the three have different scopes and levels, each performing its own duties, exchanging data through interfaces rather than replacing each other. So ERP and MRP have an "inclusion" relationship, while ERP and MES/PLM have a "partner" relationship.

SCM

The supply chain includes participants such as suppliers, manufacturers, retailers, and logistics in the process of products or services from raw materials to delivery to customers.

SCM systems can be divided into supply chain planning and supply chain execution:

Supply Chain Planning: Demand forecasting, material requirements, production planning, logistics distribution planning.
Supply Chain Execution: Managing the flow of products from distribution centers to warehouses and customers.

Supply chains can be divided into three types based on "what triggers production":

Type	Production Trigger	Trade-off
Push	Produce based on demand forecasts, then push to channels.	Inventory ready, fast shipping, but forecast errors cause surplus or shortages.
Pull	Produce or assemble only after receiving actual orders.	Low inventory, no overproduction, but customer waiting lead time is longer.
Push-Pull	Front end pushes based on forecasts, back end pulls based on orders.	Combines the efficiency of push with the flexibility of pull.

The key to push-pull is the "push-pull boundary point." Before the boundary point, common parts or semi-finished products are prepared based on forecasts (push), and after receiving orders, final assembly or customization is done according to customer specifications (pull). For example, computer manufacturers prepare standard parts, and after receiving orders, assemble them according to configurations; the front end enjoys scale efficiency, and the back end retains customization flexibility.

Electronic Commerce

Electronic commerce is business activity supported by ICT to share information, execute transactions, and maintain relationships between buyers and sellers.

Type	Full Name	Description
B2B	Business to Business	Business to business, both buyers and sellers are businesses, such as procurement platforms between enterprises and suppliers, raw material trading platforms.
B2C	Business to Consumer	Enterprises sell goods or services directly to consumers, such as online bookstores, brand official e-commerce.
C2B	Consumer to Business	Consumer-led, enterprise-responsive, such as group buying to gather bargaining power, consumers initiate needs for enterprises to undertake.
C2C	Consumer to Consumer	Transactions between consumers, platforms mediate, such as auction sites, second-hand trading platforms.
B2E	Business to Employee	Enterprises provide services, information, and goods to employees through internal platforms, such as employee portals, employee welfare or shopping platforms, online education and training.
O2O	Online to Offline	Online traffic or ordering, offline consumption or pickup, such as using online coupons in physical stores, food delivery and reservations.

E-commerce trends include socialization, mobilization, and localization. The positioning function of mobile devices allows service providers to provide location services closer to the context.

KM

Knowledge management systems use information technology to support the storage, retrieval, creation, transfer, and application of organizational knowledge.

Component	Description
Knowledge Base	Document libraries and information libraries, supporting capturing, organizing, storing, searching, and accessing knowledge.
Knowledge Map	Graphically presenting knowledge sources, storage locations, expert locations, tasks, and knowledge relationships.

The two have different divisions of labor. The knowledge base is a content warehouse for storing knowledge, while the knowledge map is an index for finding knowledge, marking where knowledge is scattered, who holds it, and pointing to the knowledge base and other sources (including tacit knowledge in people's minds).

Knowledge maps are divided into three types based on how they organize organizational knowledge:

Type	Organize Knowledge By
Conceptual Knowledge Map	Classified by concept or theme, presenting the hierarchy of knowledge and relationships between concepts.
Process Knowledge Map	Corresponds to enterprise processes, marking which knowledge is needed or produced by each process activity.
Competency Knowledge Map	Corresponds to personnel capabilities, presenting where experts are and the distribution of organizational knowledge capabilities.

BI

BI collects, integrates, analyzes, and presents enterprise data to support decision-making. Its operating mechanism can be divided into three levels:

Data integration and cleaning: Aggregating sources such as ERP, CRM, external data, etc., and standardizing them.
Multi-dimensional analysis and visualization: Converting data into charts, dashboards, or interactive reports.
Real-time monitoring and decision support: Providing alerts, scenario simulation, and KPI tracking.

CRM

CRM centers on customers, integrating data and processes of marketing, sales, and service. Main functions are divided into three categories:

Data integration and management: Centralized integration of customer contact information, purchase records, service records, and interaction history.
Marketing and sales support: Managing potential customers, sales pipelines, and marketing automation.
Service and interaction management: Recording customer service interactions, providing knowledge bases, and automated replies.

Three Types of CRM

Type	Focus
Operational	Automating front-line marketing, sales, and service processes, facing customers directly.
Analytical	Analyzing customer data collected by operational CRM to do segmentation, value evaluation, and behavior prediction.
Collaborative	Integrating customer interactions and communication across multiple channels such as phone, web, email.

Customer Lifetime Value (LTV)

Customer Lifetime Value (LTV) estimates the total profit a customer can bring during the entire period of interaction, which is the core metric for analytical CRM to measure customer value. Under the simplified model of subscription systems, LTV = Monthly Marginal Profit × Expected Duration in Months, where the expected duration in months can be estimated by the reciprocal of the monthly Churn Rate (1 ÷ Monthly Churn Rate).

LTV Calculation Example

A SaaS service has an average revenue per user (ARPU) of $30 per month, a marginal gross margin of 80%, and a monthly churn rate of 4%. The expected duration is 1 ÷ 0.04 = 25 months, the monthly marginal profit is 30 × 80% = $24, so LTV = 24 × 25 = $600.

RPA

RPA uses software robots to simulate human operations on system interfaces, automatically executing tasks with clear rules and high repetition. It is suitable for tasks such as data entry, file movement, report generation, and data conversion between systems. It is not equivalent to AI, but can be combined with AI to handle more complex document recognition and judgment processes.

Digital Transformation and Value Creation

A business model (BM) describes how an enterprise establishes and uses resources to provide valuable products or services to customers, thereby obtaining profits and creating enterprise value.

Aspect	Description	Elements
Value Proposition	Value commitment made by the enterprise to the target customer group.	Target customer group, product or service, customer value
Value Configuration	How internal and external resources form value creation processes.	Stakeholder network, key activities, customer relationships, distribution channels
Value Structure	Organizational, technical, and capability foundations supporting the business model.	Organizational structure, organizational culture, ICT, core resources, core capabilities
Value Finance	Cost, pricing, revenue, and profit logic.	Cost structure, pricing model, revenue structure, potential profit

The enterprise Value Chain is a combination of activities from raw material acquisition to product or service delivery to customers. Porter's value chain divides activities into primary activities and support activities.

Type	Definition	Included Activities
Primary Activities	Directly participate in the production and delivery of products or services, adding value to products link by link along the process.	Inbound logistics, operations and production processing, outbound logistics, marketing and sales, after-sales service
Support Activities	Do not directly produce products, but span and support all primary activities, providing the resources and foundations they need.	Procurement, human resource management, technology development, enterprise infrastructure

Simply put, primary activities are the linear process of products from material entry to after-sales, while support activities are the common foundation that horizontally supports this process. In the diagram below, primary activities are connected from left to right, and support activities span across them, which is exactly this structure.

BPR Business Process Re-engineering

Business Process Re-engineering (BPR) is rethinking the fundamental basis of an enterprise, thoroughly renovating operational processes, so that key performance indicators obtain significant improvement.

Four points of BPR:

Point	Description
Fundamental	Re-asking why the enterprise does this thing and why it does it this way.
Radical	Redesigning processes from the ground up, not limited by existing rules.
Dramatic	Pursuing significant improvement, not just small incremental improvement.
Process	Focusing on cross-departmental operational processes, not just looking at organizational structure.

The five-step model proposed by scholars Davenport and Short:

Establish enterprise vision and activity goals.
Identify processes that need redesign.
Understand existing processes.
Determine the function of information technology.
Build a prototype of the new process.

The eight-step model proposed by scholars Gunasekaran et al.:

Educate employees to recognize BPR.
Select and establish a BPR promotion team.
Evaluate existing processes and initially establish processes that need re-engineering.
Determine measurement methods.
Design re-engineered processes.
Education and training.
Simulation testing.
Officially introduce new processes.

Change Management: Lewin's Three-Stage Model

For significant changes like BPR to be successfully implemented, supporting change management is required. Kurt Lewin proposed the most classic three-stage change model.

Stage	English	Focus
Unfreezing	Unfreezing	Establishing a sense of urgency for change, breaking status quo inertia, reducing employee resistance.
Changing	Changing (Moving)	Promoting new processes and new practices, providing training and support during the period.
Refreezing	Refreezing	Institutionalizing new practices, incorporating into performance evaluation, forming new habits.

Kotter's Eight-Step Change Model is a more concrete implementation version:

Establish a sense of urgency.
Form a change promotion coalition.
Propose a vision and strategy for change.
Communicate the vision for change.
Empower employees to take action.
Create short-term wins.
Consolidate achievements and promote more change.
Embed new practices into organizational culture.

Change Log

2026-06-11 Initial document creation.

iPAS Exam Preparation Notes - Operational Intelligence Analyst (Entry Level) ​

Basic Knowledge of Operational Intelligence ​

Operational Intelligence vs. Business Intelligence ​

Operational Intelligence System Architecture ​

KPIs and the Balanced Scorecard ​

Applications of Operational Intelligence in Business Management ​

Evaluating and Planning Operational Intelligence ​

Gantt Charts, PERT, and CPM ​

Basic Data Analysis ​

Data Processing Chain ​

Data Sources and Datafication ​

Structured, Semi-structured, and Unstructured Data ​

Data Quality and Preprocessing ​

Data Storage: From Databases to Data Lakehouses ​

ETL and ELT ​

OLTP and OLAP ​

Data Visualization and Chart Selection ​

Descriptive Statistics and Common Statistical Concepts ​

Data Mining and Machine Learning ​

Business Management Basic Knowledge ​

Business Environment and Strategic Management ​

Competition and Strategy Analysis Frameworks ​

Manufacturing vs. Service Industries ​

Marketing and Product Management ​

Research & Development and Human Resources ​

Accounting, Finance, and Financial Metrics ​

Management Activities and Organization ​

Digital Enterprise Information Tools Basic Knowledge ​

Digital Enterprise and Digital Transformation ​

Operational Intelligence Information Technology ​

Organizational Levels of Information Systems ​

Common Information Systems for Digital Enterprises ​

ERP ​

Manufacturing Information Systems: MRP, MES, and PLM ​

SCM ​

Electronic Commerce ​

KM ​

BI ​

CRM ​

RPA ​

Digital Transformation and Value Creation ​

BPR Business Process Re-engineering ​

Change Log ​

Tags

Related Notes

iPAS Exam Preparation Notes - AI Application Planner

iPAS Exam Preparation Notes - Information Security Engineer

Elasticsearch Query DSL Query Syntax Notes

Elasticsearch QueryString Query Syntax Notes